ABSTRACT

The ongoing transformation from the current US Air Traffic System (ATS) to the Next Generation Air Traffic System (NextGen) will force the introduction of new automated systems and will most likely cause automation to migrate from ground to air. This will yield new function allocations between humans and automation and therefore change the roles and responsibilities in the ATS. Yet safety in NextGen is required to be at least as good as in the current system. We therefore need techniques to evaluate the safety of the interactions between humans and automation. We think that current human factors studies and simulation-based techniques will fall short given the complexity of the ATS, and that simulations need to be complemented with more automated techniques, such as model checking, which offers exhaustive coverage of the non-deterministic behaviors in nominal and off-nominal scenarios. In this work, we present a verification approach based on both simulation and model checking for evaluating the roles and responsibilities of humans and automation. Models are created using Brahms (a multi-agent framework), and we show that traditional Brahms simulations can be integrated with automated exploration techniques based on model checking, thus offering a complete exploration of the behavioral space of the scenario. Our formal analysis supports the notion of beliefs and probabilities to reason about human behavior. We demonstrate the technique with the Uberlingen accident, since it exemplifies the authority problems that arise when pilots receive conflicting advice from human and automated systems.
General Terms

Human Factors; Verification. 

INTRODUCTION 

Over the past few years, the US has embarked on a transformation of the Air Transportation System (ATS) to address the expected increase of air traffic in the US. The original prediction was that traffic in 2025 would be between two and three times greater than current traffic. There is no consensus on the actual size of the increase, but everybody agrees that the US needs to modernize the ATS and to implement NextGen (Next Generation Air Transportation System) to accommodate the traffic increase over the next 15 years; Europe is going through a similar effort with SESAR. An important goal of NextGen is to increase efficiency without compromising safety. The implementation of NextGen, described in the Integrated Working Plan (IWP), will see the introduction of new automated systems (e.g., ADS-B, GPS-based navigation) and new air traffic paradigms (e.g., 4D trajectories), which will cause some Air Traffic Management (ATM) functions to migrate from the ground to on-board systems, and possibly vice versa. The new automation will change both the allocation of functions and the roles and responsibilities of air traffic controllers and pilots. As a consequence, it poses new challenges in assessing the safety of the overall system. This is the focus of our work.

The US National Airspace System (NAS) is currently quite safe, and accidents are at a record low. NextGen needs to provide at least the same, if not a better, level of safety. The NextGen IWP has a requirement (R-1440) that calls for new and improved verification and validation (V&V) techniques for complex systems. This is often understood as applying only to the software systems that will be used in NextGen. However, we also need to recognize that NextGen is a complex system in which humans and autonomy (or automation: we do not need to make a distinction in this work) interact in quite subtle ways. Therefore, we also need new safety evaluation techniques to verify and validate the interactions of humans and autonomy in the complex system that is NextGen. Moreover, there is broad consensus that the earlier in the lifecycle V&V is done, the easier it is to detect and fix errors [5]. In our work, we focus on developing methods for evaluating, early in the design phase, models of complex interactions in which there are multiple, different, simultaneous, situation-dependent assignments of authority and autonomy (A&A) among humans and automation. In order to ensure safety in NextGen, there is a need for well-defined formalizations of procedures, of the possible actions of the actors involved, and of the consequences of those actions. The formalization should facilitate the analysis of various interactions between the actors; the analysis can use either simulations or formal search-based techniques such as model checking. We specifically need to develop methods that allow us to articulate the authority bounds (limits) and behavior in terms of ownership: who has authority in any situation and how it may affect safety. The simplified definitions of authority and autonomy that we adopt in this work are as follows:

• Authority refers to having the right, or power, to exercise 
controls or issue air traffic commands that impact the posi- 
tion, velocity, and/or attitude of aircraft during operations. 

• Autonomy (or automation) refers to a function or system 
that can operate independently of pilot or air traffic con- 
troller intervention. 

The ATS, especially with future NextGen concepts of operation, is a complex system involving dynamic interactions among multiple actors that are largely governed through formal assignment of roles and responsibilities. These A&A assignments are made at the design level, but are executed at the operational level according to each actor's view of its roles and responsibilities. Operationally, the system continuously adjusts for shortcomings in the assignment of authority and autonomy and in the capacity of actors to perform their assigned roles and responsibilities, and it also adjusts to optimize various performance factors such as capacity, environmental impact, and safety. This suggests that system safety should derive not only from a predictable execution of assigned roles and responsibilities but also from checks and balances that ensure that the system operates as designed in the face of failures, disturbances, and degradations. The ability of the system to operate in off-nominal conditions thanks to these checks and balances provides resilience, a critical characteristic for system safety.

Assessing safety in human-automation systems can be done 
using several techniques. Historically, human-in-the-loop 
studies have been the most prominent ones [4, 20]. They are 
quite costly to perform (they are real-time studies in which 
humans interact with ATS simulations), somewhat limited in 
scale (it is difficult to pull many controllers into a study) and 
often incomplete in the sense that they can explore only a 
restrictive set of behaviors. A few user interface rules have 
been extracted from these studies and can be used as build- 
ing blocks to design the system interfaces. However, these 
rules fall short in helping in the context of a highly dynamic 
and complex system such as the ATS. Another way of an- 
alyzing human-machine systems is to create models for the 
humans (often based on the procedures that need to be per- 
formed by the humans) and run simulations [15, 18, 21] 
with the shortcoming that simulations can only examine a 
restricted set of behaviors. There has also been growing interest and research in using formal methods for assessing safety in human-automation systems, particularly in the aviation domain. Formal methods have the potential to explore all possible behaviors, given a sufficiently complex model of the humans and the systems. Early examples of the use of model checking for analyzing human-machine interactions are described in [8, 13, 22]. More recent examples try to bridge the gap between simulations and the use of model checking [3, 6, 7]: the analysis method is model checking, but the representation of the problem (i.e., models of the human and the automation) uses simulation languages instead of fairly simple finite state models. These techniques can even be expanded to
the design and the verification of aerospace systems [2]. Our 
work falls in the category of using both simulation languages 
and formal methods. Our innovation resides in using a simu- 
lation language defined for representing multi-agent systems, 
which is what the ATS really is: a complex system of inter- 
acting agents some of which are humans and some of which 
are automated systems. But we also integrate the simulation 
language with formal verification techniques based on model 
checking. 

Concretely, we model systems in the Brahms multi-agent framework [11, 24]. Brahms is a multi-agent simulation system in which people, tools, facilities, vehicles, and geography are modeled explicitly. The air transportation system is modeled as a collection of distributed, interactive subsystems such as airports, air-traffic control towers and personnel, aircraft, automated flight systems and air-traffic tools, instruments, and flight crew. Each subsystem, whether a person or a tool such as the radar, is modeled independently with properties and contextual behaviors. Brahms facilitates modeling various configurable, realistic scenarios that allow the analysis of the airspace in various conditions and the reassignment of roles and responsibilities among humans and automation.
We then apply formal methods to the proposed concepts and 
configurations early in the development process to identify 
promising candidates for safe solutions, as well as find de- 
sign problems when they are easier to fix. This combination 
of modeling and formal methods will increase assurance of 
safety and motivate adoption of advanced automation and as- 
sociated operations protocols. To motivate our approach we 
present a generalized air transportation system model based 
on the Uberlingen collision. 

The rest of the paper is organized as follows: to motivate 
our work we first describe the conditions that led to the 
Uberlingen collision. Then, we describe how humans and 
automation, as well as their interactions, are modeled. Finally, we present simulation results and a description of the verification framework, and discuss related work.

UBERLINGEN COLLISION OVERVIEW 

The Uberlingen accident [1], involving the (automated) Traffic Collision Avoidance System (TCAS), is viewed as a representative example illustrating the problem of authority versus autonomy (A&A) [10]; it is a paradigmatic example of A&A conflicts. In particular, TCAS has the ability to reconfigure the relationship between the pilot and the air traffic control center (ATCC), taking authority from the air traffic control officer (ATCO) and instructing the pilot.
TCAS

TCAS is an onboard aircraft system that uses radar transpon- 
der signals to operate independently of ground-based equip- 
ment to provide advice to the pilot about conflicting air- 
craft that are equipped with the same transponder/TCAS 
equipment. The history of TCAS dates at least to the late 
1950s. Motivated by a number of mid-air collisions over three 
decades, the United States Federal Aviation Administration 
(FAA) initiated the TCAS program in 1981. The system in use over Uberlingen in 2002 was TCAS II v7, which had been installed by US carriers since 1994.

TCAS II issues the following types of aural annunciations: 

• Traffic advisory (TA) 

• Resolution advisory (RA) 

• Clear of conflict 

When a TA is issued, pilots are instructed to initiate a visual 
search, if possible, for the traffic causing the TA. In the cases 
when the traffic can be visually acquired, pilots are instructed 
to maintain visual separation from the traffic. When an RA 
is issued, pilots are expected to respond immediately to the 
RA unless doing so would jeopardize the safe operation of 
the flight. The separation timing, called TAU, provides the 
TA alert at about 48 seconds and the RA at 35 seconds prior 
to a predicted collision. 
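The TAU criterion can be illustrated with a minimal Python sketch, assuming the simplest definition of range tau (current range divided by closure rate) and the fixed 48 s and 35 s thresholds quoted above. Actual TCAS II logic is more involved: thresholds vary with altitude-dependent sensitivity levels, and a vertical tau and a minimum-range test are also applied. The names `Advisory`, `tau_seconds`, and `advisory_for` are hypothetical.

```python
from enum import Enum


class Advisory(Enum):
    NONE = "none"
    TA = "traffic advisory"
    RA = "resolution advisory"


# Thresholds quoted in the text, in seconds before the predicted collision.
# Assumption: fixed values; real TCAS II varies them with the sensitivity level.
TA_TAU_S = 48.0
RA_TAU_S = 35.0


def tau_seconds(range_nm: float, closure_rate_kt: float) -> float:
    """Range tau: time to closest approach if the closure rate stays constant."""
    if closure_rate_kt <= 0:
        return float("inf")  # diverging or constant range: no predicted collision
    return range_nm / closure_rate_kt * 3600.0  # nm / (nm per hour) -> seconds


def advisory_for(range_nm: float, closure_rate_kt: float) -> Advisory:
    """Return the most severe advisory whose tau threshold has been reached."""
    tau = tau_seconds(range_nm, closure_rate_kt)
    if tau <= RA_TAU_S:
        return Advisory.RA
    if tau <= TA_TAU_S:
        return Advisory.TA
    return Advisory.NONE
```

For example, two aircraft 10 nm apart with a 900 kt closure rate have a tau of 40 s, so the sketch returns a TA; at 8 nm the tau is 32 s and it returns an RA.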

Uberlingen Collision Narrative 

On July 1, 2002, a midair collision between a Tupolev Tu-154M passenger jet travelling from Moscow to Barcelona and a Boeing 757-23APF DHL cargo jet manned by two pilots, travelling from Bergamo to Brussels, occurred at 23:35 local time (21:35 UTC) over the town of Uberlingen in southern Germany. The two flights were on a collision course. TCAS first issued a Traffic Advisory (TA) and then a Resolution Advisory (RA) for each plane. Just before TCAS issued an RA instructing the Tupolev to climb, the air traffic controller in charge of the sector instructed it to descend, and the crew obeyed. Since TCAS had issued an RA to the Boeing crew to descend, which they immediately followed, both planes were descending when they collided.

The decision of the Tupolev crew to follow the ATC’s instruc- 
tions rather than TCAS was the immediate cause of the acci- 
dent. The regulations for the use of TCAS state that in the 
case of conflicting instructions from TCAS and ATCO, the 
pilot should follow the TCAS instructions. The conflict in the 
Uberlingen scenario represents the conflict between the authority of automated systems (TCAS) and people (crews and ATC), as well as their autonomy (freedom to act independently). This conflict arose because the loss of separation between the two planes was neither detected nor corrected by the ATCO. Losses of separation between airplanes are frequent occurrences; it is part of the normal work of air traffic control to detect and correct them.

There was a set of complex systemic problems at the Zurich air traffic control station that caused the ATCO to miss the loss of separation between the two planes. Although two controllers were supposed to be on duty, one of the two was resting in the lounge: a common and accepted practice during the lower-workload portion of the night shift. On this particular evening, a scheduled maintenance procedure was being carried out on the main radar system, which meant that the controller had to use a less capable air traffic tracking system. The maintenance work also disconnected the phone system, which made it impossible for other air traffic control centers in the area to alert the Zurich controller to the problem. Finally, the controller's workload was increased by a late-arriving plane: an A320 landing in Friedrichshafen required the ATCO's attention, and the ATCO then failed to notice the potential separation infringement between the two planes.

The Uberlingen collision proves that methods used for cer- 
tifying TCAS II v7.0 did not adequately consider human- 
automation interactions. In particular, the certification 
method treated TCAS as if it were flight system automation, 
that is, a system that automatically controls the flight of the 
aircraft. Instead, TCAS is a system that tells the pilot how to maneuver the aircraft, an instruction that implicitly removes and/or overrides the ATC's authority. Worldwide deployment of TCAS II v7.1 was still in progress in 2012, a decade after the Uberlingen collision.

MODELING THE UBERLINGEN WORK SYSTEM 
Overview of Brahms 

Brahms is a full-fledged multi-agent, rule-based, activity programming language. It is based on a theory of work practice and situated cognition [11, 24]. The Brahms language allows for the representation of situated activities of agents in a geographical model of the world. Situated activities are actions performed by the agent in some physical and social context for a specified period of time [9]. The execution of actions is constrained (a) locally, by the reasoning capabilities of an agent, and (b) globally, by the agent's beliefs about the external world, such as where the agent is located, the state of the world at that location and elsewhere, located artifacts, activities of other agents, and communication with other agents or artifacts. The objective of Brahms is to represent the interaction between people, off-task behaviors, multi-tasking, interrupted and resumed activities, informal interactions, and knowledge, while being located in some environment representative of the real world.

The Brahms agent language can also be used to develop executable software agents that are based on models of situated behavior. This allows for the development of intelligent agents that can act and react to specific situations that occur during their execution and that have been modeled as the agent's activity behavior.

Figure 1. A simplified overview of the agents, objects, and classes in the Brahms Uberlingen model involving communications.

At each clock tick, the Brahms simulation engine inspects the model to update the state of the world, which includes all of the agents and all of the objects in the simulated world. Agents and objects have states (factual properties) and may have capabilities to model the world (e.g., a radar's display is modeled as beliefs, which are representations of the state of the aircraft). Agents and objects communicate with each other; the communications can represent verbal speech, reading, writing, etc., and may involve devices such as telephones, radios, displays, etc. Agents and objects may act to change their own state, beliefs, or other facts about the world.
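The split between facts, beliefs, and communication described above can be sketched in Python. This is not Brahms syntax and not how the Brahms engine is implemented; it is a minimal illustration, assuming that perception copies a fact into an observer's beliefs and that a message delivered at a clock tick updates the receiver's beliefs. `Entity`, `Message`, `observe`, `tick`, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An agent or object: facts describe the world, beliefs are its own view."""
    name: str
    facts: dict = field(default_factory=dict)
    beliefs: dict = field(default_factory=dict)


@dataclass
class Message:
    """A communication (e.g., a radio call) that transmits one belief."""
    sender: str
    receiver: str
    belief: tuple  # (attribute, value)


def observe(observer: Entity, target: Entity, attribute: str) -> None:
    """Perception: copy a fact about the target into the observer's beliefs."""
    observer.beliefs[(target.name, attribute)] = target.facts[attribute]


def tick(world: dict, outbox: list) -> None:
    """One clock tick: deliver pending messages as beliefs of their receivers."""
    for msg in outbox:
        attribute, value = msg.belief
        world[msg.receiver].beliefs[attribute] = value
    outbox.clear()


world = {
    "BTC": Entity("BTC", facts={"flight_level": 360}),
    "Radar": Entity("Radar"),
    "ATCO": Entity("ATCO"),
    "BTC_Pilot": Entity("BTC_Pilot"),
}
observe(world["Radar"], world["BTC"], "flight_level")  # radar display as beliefs
outbox = [Message("ATCO", "BTC_Pilot", ("instructed_flight_level", 350))]
tick(world, outbox)
```

After the tick, the pilot believes the instructed level is FL350 while the aircraft's fact is still FL360; in this sketch only a separate pilot action would change the fact.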


Constructs in the Brahms Uberlingen Model 

In a Brahms model, the system being modeled is the entire 
work system, including agents, groups to which they belong, 
facilities (buildings, rooms, offices, spaces in vehicles), tools 
(e.g., radio, radar display/workstation, telephone, vehicles), 
representational objects (e.g., a phone book, a control strip), 
and automated subsystems (e.g., TCAS), all located in an ab- 
stracted geography represented as areas and paths. Thus the 
notion of human-system interaction in Brahms terms is more 
precisely an interaction between an agent and a subsystem in 
the model; both are behaving within the work system. 

A workframe in Brahms can model the interaction between an agent's beliefs, perception, and action in a dynamic environment. For example, these characteristics are leveraged when modeling how a pilot deploys the aircraft landing gear. A pilot uses the on-board landing control and then confirms that the landing gear is deployed while monitoring the aircraft's trajectory on the Primary Flight Display. This is modeled in Brahms as follows: a pilot (e.g., the DHL pilot) is a member of the PilotGroup, which has a composite activity for managing the aircraft energy configuration. For further details about how the different Brahms constructs are used to model the various aspects of the Uberlingen collision, we refer the reader to our technical report [10].
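As a rough analogy only (Brahms has its own workframe syntax and semantics, which this does not reproduce), the following Python sketch treats a workframe as a set of preconditions on the agent's beliefs plus an ordered list of activities, and selects the highest-priority workframe whose preconditions hold. The class, function, workframe, and activity names in the landing-gear example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Workframe:
    """Belief-conditioned, prioritized list of activities (illustrative only)."""
    name: str
    preconditions: list[Callable[[dict], bool]]  # predicates over the agent's beliefs
    activities: list[str]                        # executed in order once selected
    priority: int = 0


def select_workframe(workframes: list[Workframe], beliefs: dict) -> Optional[Workframe]:
    """Return the highest-priority workframe whose preconditions all hold, if any."""
    eligible = [wf for wf in workframes if all(p(beliefs) for p in wf.preconditions)]
    return max(eligible, key=lambda wf: wf.priority, default=None)


deploy_gear = Workframe(
    name="wf_deploy_landing_gear",
    preconditions=[lambda b: b.get("phase") == "approach",
                   lambda b: b.get("gear") == "up"],
    activities=["set_gear_lever_down", "confirm_gear_down", "monitor_trajectory_on_PFD"],
    priority=2,
)
# No preconditions, so it is always eligible, but at the lowest priority.
monitor = Workframe("wf_monitor_flight", preconditions=[], activities=["scan_PFD"])
```

With beliefs `{"phase": "approach", "gear": "up"}` the gear workframe wins; in cruise only the monitoring workframe is eligible.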


A specific instance of a conceptual class is called a conceptual object. A particular flight (e.g., DHX611, a conceptual object) is operated by a particular airline and consists of a particular crew (a group) of pilots (agents) who file a particular flight plan document (an object), and so on. Each agent and object instance has possible actions defined by workframes, where each workframe contains a set of activities that are ordered and often prioritized. Certain workframes are inherited from the group (for agents) or class (for objects). The set of possible actions is modeled at a general level, and all members of a group/class have similar capabilities (represented as activities, workframes, and thoughtframes); however, at any time during the simulation, agent and object behaviors, beliefs, and facts about them will vary depending on their initial beliefs/facts and the environment with which they are interacting. The model incorporates organizational and regulatory aspects implicitly, as manifested in how work practices relate roles, tools, and facilities.
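Inheritance of workframes from groups can be pictured with the following Python sketch. It is a hypothetical illustration rather than Brahms code: the parent group `FlightCrew`, the workframe names, and the belief values are assumptions. It shows members of one group sharing the same capabilities while carrying different initial beliefs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """A group contributes workframes to its members and to its subgroups."""
    name: str
    workframes: list[str]
    parent: Optional[Group] = None

    def all_workframes(self) -> list[str]:
        """Workframes inherited from ancestor groups first, then this group's own."""
        inherited = self.parent.all_workframes() if self.parent else []
        return inherited + self.workframes


@dataclass
class Agent:
    name: str
    group: Group
    beliefs: dict = field(default_factory=dict)

    def workframes(self) -> list[str]:
        # Every member of a group gets the same capabilities from the group ...
        return self.group.all_workframes()


flight_crew = Group("FlightCrew", ["wf_communicate_by_radio"])
pilots = Group("PilotGroup", ["wf_manage_energy_configuration"], parent=flight_crew)
# ... while different initial beliefs make members behave differently at run time.
dhl_pilot = Agent("DHL_Pilot", pilots, beliefs={"flight": "DHL"})
btc_pilot = Agent("BTC_Pilot", pilots, beliefs={"flight": "BTC"})
```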

A Brahms simulation model configuration consists of the 
modeled geography, agents, and objects, as well as their ini- 
tial facts and beliefs of agents and objects. The different 
configurations allow us to perform a what-if analysis on the 
model. The time of departure for a flight might be an ini- 
tial fact in a Brahms model. One can modify the model to 
assign a different time of departure for a flight in each simu- 
lation run. Another example of configurable initial facts may 
include work schedules for air traffic controllers. In one con- 
figuration of the work schedules an air traffic controller may 
be working alone in the ATCC, while in another configuration, two controllers would be present in the ATCC. Initial
beliefs of an agent might be broad preferences affecting be- 
havior (e.g., TCAS should overrule the ATC), thus initial be- 
liefs can be used as switches to easily specify alternative con- 
figurations of interest. Alternative configurations are conven- 
tionally called scenarios. Thus for example, a scenario might 
be a variation of the Uberlingen collision in which two air- 
craft have inter-route flight times that put them on an inter- 
secting path over Uberlingen; the only other flight is a late 
arriving flight for Friedrichshafen and maintenance degrades 
the radar, but the telephones are operative. 
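The idea of initial facts and beliefs acting as configuration switches can be sketched in Python. This is a hypothetical encoding, not the Brahms model: the field names are assumptions, and the controller count and crew belief in `variant` are illustrative values that only loosely follow the variation described above. Flipping one belief produces a new what-if scenario with everything else unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """A configuration: initial facts about the world and initial agent beliefs."""
    controllers_on_duty: int                # initial fact (work schedule)
    radar_degraded: bool                    # initial fact (maintenance)
    phones_working: bool                    # initial fact
    crew_believes_tcas_overrules_atc: bool  # initial belief used as a switch


def crew_response(s: Scenario, atc_instruction: str, tcas_ra: str) -> str:
    """Instruction a crew follows when ATC and TCAS conflict, per its belief."""
    return tcas_ra if s.crew_believes_tcas_overrules_atc else atc_instruction


# Hypothetical variant: degraded radar, telephones operative.
variant = Scenario(controllers_on_duty=1, radar_degraded=True,
                   phones_working=True, crew_believes_tcas_overrules_atc=False)
# What-if: flip a single switch and keep every other setting.
what_if = replace(variant, crew_believes_tcas_overrules_atc=True)
```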

In general, a model is designed by the model builder with sufficient flexibility to allow investigating scenarios of interest. The causal factors of interest (e.g., use of control strips when approving aircraft altitude changes, availability of telephones) constitute states of the world and behaviors that can be configured through initial facts and beliefs. The initial settings define a space of scenarios. Using Brahms to evaluate designs within this space, while using formal methods to help modelers understand its boundaries so they can refine the model to explore alternative scenarios, constitutes the main research objective of this work.
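The space of scenarios defined by the initial settings can be made concrete by enumerating every combination of factor values. A minimal Python sketch; the factor names and their values are hypothetical, chosen to echo conditions mentioned in this paper.

```python
from itertools import product

# Hypothetical causal factors and the values each initial fact/belief may take.
FACTORS = {
    "controllers_on_duty": (1, 2),
    "stca_optical_alert": (True, False),
    "phones_working": (True, False),
    "aef_delay_min": (0, 5, 10),
}


def scenario_space(factors: dict) -> list:
    """Enumerate every combination of factor values as one scenario configuration."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]


scenarios = scenario_space(FACTORS)  # 2 * 2 * 2 * 3 = 24 configurations
```

The space grows multiplicatively with each added factor, and each configuration can in turn yield many outcomes because of non-deterministic timings.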

The simulation engine determines the state of a modeled ob- 
ject (e.g., aircraft). It determines the state of its facts and 
beliefs. Some objects are not physical things in the world, 
but rather conceptual entities, called conceptual classes in the 
Brahms language. These represent processes, a set of people, 
physical objects, and locations (e.g., flights), and institutional 
systems (e.g., airlines) that people know about and refer to 
when organizing their work activities. 

High-level Structure of the Brahms Uberlingen model 

An overview of the agents, objects, and classes in the Brahms Uberlingen model is shown in Figure 1. All of the systems that are mentioned in the BFU Report [1] and that play a role in the accident have been modeled; a partial list follows:

1. Agents: 

(a) Pilots in each aircraft 

(b) Two ATCOs at Zurich 

2. Geography: 

(a) Airports: Moscow, Bergamo, Barcelona, Brussels 

(b) Control Centers at Zurich and Karlsruhe, including the layout of physical workstations

(c) Aircraft interior layout 

3. Objects: 

(a) Aircraft: DHL, BTC, AEF, and other aircraft in the 
sector during the simulated time period (flights are 
conceptual objects associated with these). 

(b) Flight Management Computer (FMC) with Cruise & 
Standard Terminal Arrival Route (STAR) modes for 
DHL & BTC 


(c) Control center workstations including radio frequen- 
cies and sectors. 

4. Activities: 

(a) Flight Take-off Phase: Clock in ATCC announces time for departure; ATCO communicates departure approval; FMC guides with Standard Instrument Departure; Pilot activities and communications.

(b) Flight Cruise Phase: FMC flying in auto-pilot mode 
using flight plan; Pilot activities and communications. 

(c) Flight Phase: Pilot activities and communications; ATCOs hand off and accept flights.

(d) ATCOs hand off and accept flights

(e) Flight Landing Phase: Pilot requests permission to 
land and ATCO communicates approval; FMC guides 
with Standard Terminal Arrival Route; Pilot activities 
and communications. 

Key Subsystems and Conditions 

The following key subsystems and conditions are modeled in the Brahms Uberlingen model:

1. Interactions among Pilot, Flight Systems, and Aircraft for climb and cruise with European geography for one plane, the DHL flight plan.

2. BTC flight, flight plan (two versions: on-time and de- 
layed with collision) and geography - this is independent 
of ATCO actions, to confirm that simulation reproduces 
collision with flight paths actually flown. 

3. Radar Systems and Displays with ATCOs, located in Con- 
trol Centers, monitoring when flights are entering and ex- 
iting each European flight sector in flight plans. 

4. Handover interactions between Pilot and ATCOs for each 
flight phase. 

5. Two ATCOs in Zurich, the Radar Planner (RP) and the ARFA Radar Executive (RE), assigned to two workstations (RE has nothing to do under these conditions).

6. Add TCAS with capability to detect separation violations, 
generate Traffic Advisory (TA) and Resolution Advisory 
(RA). DHL and BTC are delayed (on collision course, 
which tests TCAS) 

7. Pilots follow TCAS instructions 

8. ATCO may intervene prior to alert depending on when 
ATCO notices conflict in Radar Displays since ATCO is 
busy communicating with other flights, moving between 
workstations, and trying to contact Friedrichshafen control 
tower on the phone. 

9. AEF flight and flight plan so Zurich ARFA RE performs 
landing handoff to Friedrichshafen controller. 

10. Third plane, the AEF flight, arrives late, requiring ATCO communications and handoff to Friedrichshafen: (a) handled by the ATCO in Zurich at the right workstation (ARFA sector) and not at the left East and South sector workstation, (b) phone communications for handovers, (c) methods used by the ATCO when phone contact does not work:

(i) Ask Controller Assistant (CA) to get another number (pass-nr); requires about 3 minutes for CA to return

(ii) After pass-nr fails, discuss other options with CA for about 30 sec

(iii) When not busy handling other flights, try pass-nr again.

(iv) When the plane is at the Top-Of-Descent waypoint, as specified in the STAR, for landing at the airport, within N nm of the airport, the method of last resort is to call the pilots on the radio and ask them to contact the tower directly

11. STCA (Short Term Conflict Alert) added to ATCO workstations (modeling normal and fallback mode without optical alert). The ATCO responds to the alert by advising the Pilot to change flight level based on the next flight segment of the flight plan.

12. Reduce to one Zurich ATCO which triggers the sequence 
of variations from the nominal situation; now Zurich 
ATCO must operate flights from two workstations. 

Note that Figure 1 does not show the geography, facilities, and flights.
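The ATCO's fallback methods for the AEF handoff in item 10 form an ordered procedure. A minimal Python sketch, assuming the steps run strictly in sequence (the Brahms model interleaves them with other flights, which this ignores). The 180 s and 30 s durations come from the list; the phone and radio durations and all names are hypothetical.

```python
from __future__ import annotations

# Durations from the text; the remaining values are hypothetical placeholders.
CA_LOOKUP_S = 180       # "about 3 minutes for CA to return"
CA_DISCUSSION_S = 30    # "about 30 sec"
PHONE_ATTEMPT_S = 20    # hypothetical: time to dial and wait on one number
RADIO_CALL_S = 15       # hypothetical: time for one radio exchange


def handoff_plan(phones_working: bool, aef_at_top_of_descent: bool) -> list[tuple[str, int]]:
    """Ordered ATCO actions for the AEF landing handoff, with rough durations (s)."""
    plan = [("call tower on known number", PHONE_ATTEMPT_S)]
    if phones_working:
        return plan
    plan += [
        ("ask CA for another number (pass-nr)", CA_LOOKUP_S),
        ("try pass-nr", PHONE_ATTEMPT_S),
        ("discuss other options with CA", CA_DISCUSSION_S),
        ("retry pass-nr when not busy", PHONE_ATTEMPT_S),
    ]
    if aef_at_top_of_descent:
        # Method of last resort: ask the pilots to contact the tower directly.
        plan.append(("ask AEF pilots by radio to contact tower", RADIO_CALL_S))
    return plan


def handoff_duration(plan: list[tuple[str, int]]) -> int:
    """Total elapsed seconds, assuming the steps run one after another."""
    return sum(seconds for _, seconds in plan)
```

With working phones the handoff takes a single call; with the phones down, the sketch's plan stretches to several minutes, during which the handoff stays unresolved.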

PROPERTIES OF INTEREST 

The question that the analysis tries to answer, using both simulation and verification, is why a collision is averted under certain conditions but not under others. In the analysis, we try to gauge how the temporal sensitivity and variability of the interactions among the ATCO, TCAS, and the pilots impact the potential loss of separation and collision of the planes. Concretely, the questions that we ask during the analysis are:

• Given that the arrival of the AEF flight disrupts the ATCO's monitoring of the larger airspace (e.g., if it arrives sufficiently late, no collision occurs), what is the period (relative to the BTC and DHL flight paths) when AEF's arrival can cause a collision?

• During this period, does a collision always occur or are 
there variations of how the AEF handoff occurs, such that 
sometimes the separation infringement is averted? 

• Is there evidence that high-priority activities such as moni- 
toring the sector are repeatedly interrupted or deferred, im- 
plying the ATCO is unable to cope with the workload? 
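The third question can be phrased as a property over a simulation trace. A minimal Python sketch, assuming the trace is a list of (time, activity) pairs and that monitoring counts as deferred when the gap between consecutive starts of sector monitoring exceeds a threshold. The activity names, the sample trace, and the 60 s threshold are hypothetical.

```python
from __future__ import annotations

MAX_GAP_S = 60.0  # hypothetical threshold for "repeatedly deferred" monitoring


def longest_monitoring_gap(trace: list[tuple[float, str]],
                           activity: str = "monitor_sector") -> float:
    """Longest interval (s) between consecutive starts of the monitoring activity.

    Returns infinity when the activity starts fewer than twice, so that an
    unmonitored sector is treated as an unbounded gap.
    """
    starts = sorted(t for t, name in trace if name == activity)
    if len(starts) < 2:
        return float("inf")
    return max(later - earlier for earlier, later in zip(starts, starts[1:]))


trace = [
    (0.0, "monitor_sector"),
    (30.0, "radio_call_AEF"),
    (40.0, "monitor_sector"),
    (100.0, "phone_call_Friedrichshafen"),
    (220.0, "monitor_sector"),
]
overloaded = longest_monitoring_gap(trace) > MAX_GAP_S  # 180 s gap exceeds 60 s
```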

SIMULATION OF THE UBERLINGEN SCENARIOS 

The Brahms Uberlingen Model defines a space of work systems (e.g., is the STCA optical alert functioning? are there two ATCOs?) and events (e.g., the aircraft and flights). Every configuration of the model, which involves configuring initial facts, beliefs, and agent/object relations, constitutes a scenario that can be simulated and will itself produce many different outcomes (chronologies of events), because of the non-deterministic timings of agent and object behaviors. The model was developed and tested with a variety of scenarios (e.g., varying additional flights in the sector; all subsystems working properly). The Uberlingen accident is of special interest: in it, the systems are configured as they were at the time of the accident, and the DHL and BTC planes are on intersecting routes.
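The difference between simulating a scenario and exhaustively exploring its non-deterministic timings can be illustrated with a toy model in Python. It is neither the Brahms model nor a model checker: the timing ranges and the collision rule are hypothetical, loosely echoing runs 5 and 8, in which the RA and the BTC autopilot disengagement coincide. Random simulation samples a few timing combinations, whereas exhaustive enumeration, the core idea behind explicit-state model checking, visits all of them.

```python
import itertools
import random

# Toy timeline relative to the RA at t = 0 s. With the 48 s / 35 s tau
# thresholds, the TA precedes the RA by 13 s.
TA_S, RA_S = -13, 0
ATCO_TIMES = range(-20, 11)   # hypothetical non-deterministic ATCO intervention times
BTC_STARTS = range(-5, 6)     # hypothetical BTC descent start times


def collides(atco_t: int, btc_t: int) -> bool:
    """Toy rule (an assumption): collision only if the ATCO intervenes between
    TA and RA and BTC starts descending within 2 s of the RA."""
    return TA_S <= atco_t < RA_S and abs(btc_t) <= 2


def exhaustive() -> set:
    """Visit every combination of non-deterministic choices."""
    return {c for c in itertools.product(ATCO_TIMES, BTC_STARTS) if collides(*c)}


def sampled(runs: int, seed: int) -> set:
    """Simulation style: draw random timings and keep the colliding ones."""
    rng = random.Random(seed)
    draws = [(rng.choice(ATCO_TIMES), rng.choice(BTC_STARTS)) for _ in range(runs)]
    return {c for c in draws if collides(*c)}
```

Exhaustive enumeration finds all 65 colliding combinations out of 341; ten random runs can reveal at most ten of them, and possibly none.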

Setting up the Simulation 

The key events that occur during simulation are logged chronologically in a file that constitutes a readable trace of the interactions among the ATCO, pilots, and automated systems. The log includes information about the following: (a) ATCO-pilot interaction regarding a route change, including flight level and climb/descend instruction; (b) separation violation events detected by TCAS, including the TAU value; (c) closest aircraft and separation detected by the ATCO when monitoring the radar; (d) STCA optical or aural alerts, including the separation detected; (e) agent movements (e.g., the ATCO shifting between workstations); (f) aircraft movements, including departure, entering and exiting sectors, waypoint arrival, landing, collision, airspeeds, vertical speeds, etc.; (g) aircraft control changes (e.g., autopilot disengaged); (h) radio calls, including communicated beliefs; (i) phone calls that fail to complete.
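A chronological log like this one can be queried programmatically, for instance to find when the first TA, RA, or ATCO instruction occurred in a run. A minimal Python sketch, assuming each log line has already been parsed into a time, an event kind, an actor, and a free-text detail; the record layout, kind names, and sample entries are hypothetical and do not reflect the actual Brahms log format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEvent:
    time_s: float  # simulation clock
    kind: str      # hypothetical categories, e.g. "tcas_ta", "tcas_ra", "atco_instruction"
    actor: str
    detail: str


def first_time(log: list[LogEvent], kind: str, actor: Optional[str] = None) -> Optional[float]:
    """Time of the first event of the given kind (optionally for one actor), else None."""
    for event in sorted(log, key=lambda e: e.time_s):
        if event.kind == kind and (actor is None or event.actor == actor):
            return event.time_s
    return None


log = [
    LogEvent(12.0, "tcas_ra", "DHL", "descend"),
    LogEvent(0.0, "tcas_ta", "DHL", "traffic"),
    LogEvent(5.0, "atco_instruction", "BTC", "descend"),
    LogEvent(12.0, "tcas_ra", "BTC", "climb"),
]
```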

Summary of Results 

The outcomes of ten simulation runs of the Brahms Uberlingen model configured for the collision scenario are shown in Figure 2. In simulation runs 1, 2, and 3, the ATCO intervenes before the TCAS TA; although the planes have not yet separated sufficiently, TCAS takes BTC's descent into account and advises DHL to climb. In simulation runs 4, 5, 7, 8, and 9, the ATCO intervenes between the TA and the RA. In these runs, whether the planes collide depends on timing: as shown in Figure 2, two of the five runs result in a collision. Note that in our model a collision is defined as occurring when the vertical separation between the planes is less than 100 feet. Finally, in simulation runs 6 and 10, the ATCO intervenes about 10 seconds after the TCAS RA, which the BTC pilots ignore (or might be imagined as discussing for a long time); BTC continues flying level while DHL descends, so they miss each other, separated by more than 600 ft at the crossing point. In other runs, we have also observed that the ATCO intervenes so late that he actually takes the pilots' report about the TCAS RA instructions into account.
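The classification of the ATCO intervention relative to the TA and RA used in Figure 2, together with the collision criterion, can be written as two small functions. A minimal Python sketch; the function names and the treatment of boundary cases are assumptions, while the 100 ft threshold comes from the text.

```python
COLLISION_FT = 100.0  # from the text: < 100 ft vertical separation is a collision


def atco_timing(atco_s: float, ta_s: float, ra_s: float) -> str:
    """Classify the ATCO intervention relative to the TCAS TA and RA.

    Assumption: an intervention exactly at the TA counts as "During",
    and one exactly at the RA counts as "After".
    """
    if atco_s < ta_s:
        return "Before"
    if atco_s < ra_s:
        return "During"
    return "After"


def collided(vertical_separation_ft: float) -> bool:
    """Collision criterion used by the model."""
    return vertical_separation_ft < COLLISION_FT
```

For instance, with a TA at 50 s and an RA at 63 s of simulated time (hypothetical values), an intervention at 55 s is classified as "During".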

When the ATCO intervenes in the period between the TA and the RA (runs 4, 5, 7, 8, and 9), a collision is possible, as at Uberlingen. That is, the ATCO has to intervene before the TA, advising BTC to descend, for BTC's descent to be under way early enough for TCAS to advise DHL to climb. In runs 4 and 7, collision is narrowly averted because BTC begins to descend four or five seconds after the TCAS RA, which is sufficient for a narrow miss (just over 100 feet). In run 9, the BTC descent begins 5 seconds before the RA, hence the aircraft miss by more than 200 feet. Runs 5 and 8 (Figure 9-3) lead to collision because the TCAS RA and the BTC autopilot (AP) disengagement occur at the same time, as happened at Uberlingen. Because the model uses the Uberlingen descent tables to control the BTC and DHL aircraft during the emergency descent, the simulation matches the paths of the aircraft at Uberlingen, guaranteeing a collision (within a defined range of error). In both cases, TCAS did not instruct DHL to climb because BTC was above DHL at that time and had not yet begun its descent.
| Run # | Collide? | Explanation | ATCO-BTC | TCAS RA-DHL | ATCO relative to TA/RA |
|---|---|---|---|---|---|
| 1 | No | TCAS detects BTC plane descending due to ATCO, so advises DHL to climb. | Descend | Climb | Before |
| 2 | No | TCAS detects BTC plane descending due to ATCO, so advises DHL to climb. | Descend | Climb | Before |
| 3 | No | TCAS detects BTC plane descending due to ATCO, so advises DHL to climb. AEF flight arrives very late after TCAS TA. | Descend | Climb | Before |
| 4 | No | DHL TCAS Descend; BTC above. Planes crossed > 100 ft vertical separation. | Descend | Descend | During |
| 5 | Yes | DHL TCAS Descend; BTC above. BTC AP turned off at DHL RA. Planes crossed < 20 ft vertical separation. | Descend | Descend | During |
| 6 | No | DHL TCAS Descend; BTC above. ATCO later than RA, so BTC level. Planes crossed > 600 ft vertical separation. | Descend | Descend | After |
| 7 | No | DHL TCAS Descend; BTC above. DHL AP turned off 2 seconds before BTC. Planes crossed > 100 ft vertical separation. | Descend | Descend | During |
| 8 | Yes | DHL TCAS Descend; BTC above. BTC AP turned off at RA. Planes crossed < 50 ft vertical separation. | Descend | Descend | During |
| 9 | No | DHL TCAS Descend; BTC above. Planes crossed > 200 ft vertical separation. | Descend | Descend | During |
| 10 | No | DHL TCAS Descend; BTC above. ATCO later than RA, so BTC level. Planes crossed > 600 ft vertical separation. | Descend | Descend | After |


Figure 2. Outcomes of ten simulation runs of the Uberlingen scenario. The greatest potential for collision arises when the ATCO intervenes between TA and RA and both aircraft descend (runs 4, 5, 7, 8, and 9).


When the ATCO intervenes after the RA, the BTC pilots in the simulations ignore the RA advice and continue level flight, which by itself averts the collision, even though the ATCO advises BTC to descend (an instruction that does not take into account that DHL is below them). Of course, we do not know what the BTC pilots would have done if the ATCO had not intervened. Had more than one pilot interpreted TCAS correctly, it seems possible that BTC would have climbed.

The final AEF hand-off (directing the pilots to contact the 
tower) always occurs in the simulation after the TCAS RA; 
at Uberlingen it occurred prior to the TA. This discrepancy 
raises many questions about what variability is desirable. In 
the verification of the system we were able to find certain 
cases where the final AEF hand-off occurs before the TCAS 
TA and the planes still collide. 

The simulation results for other configurations of the Brahms 
Uberlingen model are described in the technical report [10]. 

FORMAL VERIFICATION 

In addition to the simulation experiments, we use verification techniques to systematically explore the behaviors of the Brahms Uberlingen model in its collision-scenario configuration.

Background 

In [16] we present an extensible verification framework that takes a multi-agent system model and its semantics as input to a state-space search engine (or model checker). The search engine generates all possible behaviors of the model with respect to its semantics. The generated behaviors are then encoded as a reachability graph $G := (N, E)$, where $N$ is a set of nodes and $E$ is a set of edges. This graph is automatically generated by the search engine. Each node $n \in N$ is labeled with the belief and fact values of the agents and objects. An edge between nodes represents updates to beliefs and facts and is also labeled with a probability. In [16] we generate the reachability graph using the Java PathFinder (JPF) bytecode analysis framework: the reachable states generated by JPF are mapped to the nodes of the reachability graph. Safety properties and other reachability properties are verified on the fly as new states and transitions are generated in JPF. Additional verification activities can be performed on the reachability graph after all the JPF states have been generated.
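As a rough illustration of the data structure described above, the following Python sketch builds a reachability graph $G := (N, E)$ by an exhaustive, stateful breadth-first search. Each node is labeled with a belief/fact valuation, each edge with the update it applies and a probability, and a safety predicate is checked on the fly as each new node is generated. This is not the JPF-based framework of [16]; the names `build_reachability_graph`, `successors`, and `unsafe` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a node is labeled with an immutable set of
# (belief/fact name, value) pairs; values must be hashable.
Node = FrozenSet[Tuple[str, object]]
# An edge: (source, (updated name, new value), probability, target).
Edge = Tuple[Node, Tuple[str, object], float, Node]


def build_reachability_graph(
    init: Dict[str, object],
    successors: Callable[[Dict[str, object]], Iterable[Tuple[str, object, float]]],
    unsafe: Callable[[Dict[str, object]], bool],
) -> Tuple[List[Node], List[Edge], List[Node]]:
    """Breadth-first construction of G = (N, E) with an on-the-fly safety check.

    `successors` (hypothetical) yields (name, value, probability) updates enabled
    in a valuation; `unsafe` is the safety predicate. Returns the nodes, the edges,
    and every reachable node that violates the predicate.
    """
    start: Node = frozenset(init.items())
    nodes: List[Node] = [start]
    edges: List[Edge] = []
    violations: List[Node] = [start] if unsafe(init) else []
    seen = {start}                          # stateful search: visited-node set
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for name, value, prob in successors(dict(node)):
            updated = dict(node)
            updated[name] = value           # each edge applies one belief/fact update
            target: Node = frozenset(updated.items())
            edges.append((node, (name, value), prob, target))
            if target not in seen:          # new state: store it and check it now
                seen.add(target)
                nodes.append(target)
                frontier.append(target)
                if unsafe(updated):
                    violations.append(target)
    return nodes, edges, violations
```

The `seen` set is what makes this search stateful. Keeping it in memory for every agent and object is the cost discussed under Limitations.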

Limitations. The JPF-based MAS connector requires a complete implementation of the Brahms semantics to generate the
intermediate representation. The current implementation of 
the Brahms semantics presented in [16] only supports a lim- 
ited set of constructs. Furthermore, JPF is a stateful analysis 
engine that stores the generated model in memory. Captur- 
ing the state of all the agents and objects in Brahms including 
their workframes and thoughtframes can lead to large mem- 
ory requirements. Additionally, for large systems it is often 
intractable to generate and capture even just the intermediate 
representation in memory. 

Stateless Brahms Model Checking 
To overcome the limitations just described, in this work we
adopt a stateless model checking approach. Stateless model 
checking explores all possible behaviors of the program or 
model without storing the explored states in a visited set. The 
program or model is executed by a scheduler that tracks all 
the points of non-determinism in the program. The sched- 
uler systematically explores all possible execution paths of 
the program obtained by the non-deterministic choices. State- 
less model checking is particularly suited for exploring the 
state space of large models. In this work we instrument the 
Brahms simulator to perform stateless model checking. The 
instrumented code within the Brahms engine generates all 
possible paths (each with different combinations of activity 
durations) in depth-first order. Stateless model checkers such as VeriSoft [14] do not, in general, store paths; however, to enable further analysis of the behavior space, the Brahms stateless model checker can store all the generated paths in a database.
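The idea of stateless exploration can be sketched in a few lines of Python. The sketch assumes a hypothetical deterministic `simulate(choose)` function that replays one run from the start and calls `choose(options)` at every point of non-determinism (for Brahms, for example, picking an activity duration). No visited set is kept. Each path is obtained by re-executing the model with a different prefix of choices, in depth-first order. It illustrates the technique only and is not the instrumented Brahms engine.

```python
from typing import Callable, List, Sequence, Tuple

Choose = Callable[[Sequence[int]], int]


def explore_stateless(simulate: Callable[[Choose], object]) -> List[Tuple[List[int], object]]:
    """Enumerate every path of a deterministic `simulate` by re-execution (DFS)."""
    results: List[Tuple[List[int], object]] = []
    stack: List[List[int]] = [[]]           # choice-index prefixes still to run
    while stack:
        prefix = stack.pop()
        taken: List[int] = []               # indices chosen during this run
        widths: List[int] = []              # number of options at each choice point

        def choose(options: Sequence[int]) -> int:
            depth = len(taken)
            # Follow the prefix, then take the first option at every new point.
            index = prefix[depth] if depth < len(prefix) else 0
            taken.append(index)
            widths.append(len(options))
            return options[index]

        outcome = simulate(choose)
        results.append((taken, outcome))
        # Schedule the untried siblings of each choice made beyond the prefix.
        # Shallow ones are pushed first so the deepest alternatives run next (DFS).
        for depth in range(len(prefix), len(taken)):
            for alt in range(widths[depth] - 1, taken[depth], -1):
                stack.append(taken[:depth] + [alt])
    return results
```

Between runs only the pending choice prefixes are kept, not a set of visited states. The price is that shared prefixes are re-executed. Collecting `results` corresponds to the optional storage of generated paths in a database.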

Non-determinism in Brahms 

There are two main points of non-determinism in Brahms 
models. The first point of non-determinism is due to dura- 
tions of primitive activities. The different primitive activi- 
ties in Brahms have a duration in seconds associated with 
them. The duration of the primitive activity can either be 
fixed or can vary based on certain attributes of the primitive 
activities. When the random attribute of a primitive activ- 
ity is set to true the simulator randomly selects the primitive 
activity duration between the min and max durations speci- 
fied for the activity. The second point of non-determinism 
arises from probabilistic updates to facts and beliefs of agents 
and objects. Updates to facts and beliefs are made using 
conclude statements in Brahms. An example of a conclude statement is: conclude((Pilot.checkStall = false), bc:70, fc:70). This states that the belief and fact checkStall in the Pilot agent will be updated to false with a probability of 70%. Here bc represents belief certainty, while fc represents fact certainty.
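To see how a probabilistic conclude statement becomes a branching point for the model checker, the following sketch enumerates the possible outcomes of one conclude with belief certainty bc and fact certainty fc. It assumes, for illustration, that the belief update and the fact update are drawn independently, each taking effect with probability certainty/100 and otherwise leaving the old value in place. The function name `conclude_branches` is hypothetical.

```python
from itertools import product
from typing import List, Tuple

# One outcome: (belief updated?, fact updated?, probability of this outcome).
Branch = Tuple[bool, bool, float]


def conclude_branches(bc: int, fc: int) -> List[Branch]:
    """Outcomes of conclude(..., bc:<bc>, fc:<fc>), certainties given in percent.

    Assumes (hypothetically) independent draws for the belief and the fact.
    """
    p_belief = bc / 100.0
    p_fact = fc / 100.0
    branches: List[Branch] = []
    for belief_ok, fact_ok in product([True, False], repeat=2):
        prob = (p_belief if belief_ok else 1 - p_belief) * (p_fact if fact_ok else 1 - p_fact)
        if prob > 0:                        # drop impossible branches, e.g. for bc:100
            branches.append((belief_ok, fact_ok, prob))
    return branches
```

With bc:100 and fc:100 only one branch survives. This is why, in a model with only 100% updates, all non-determinism comes from activity durations.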

Currently, the Uberlingen model contains only deterministic updates to facts and beliefs: all updates are asserted with 100% probability. Nevertheless, there is a large degree of non-determinism due to variations in activity durations. The difference between the minimum and maximum durations ranges from 2 seconds to a few hundred seconds, which can lead to a large number of timing differences between the various events. In future work we plan to extend
the Brahms Uberlingen model to support probabilistic varia- 
tions in order to account for errors by humans and automated 
systems. 
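A back-of-the-envelope count shows why duration ranges alone produce many timing differences. Assume, for illustration only, that each of $k$ activities executes once, that its duration is chosen among whole seconds in $[\min_i, \max_i]$, and that no other source of non-determinism is present. The number $C$ of distinct duration combinations is then the product of the range sizes, and restricting every activity to three representative durations gives $C_{\text{scoped}}$:

$$
C \;=\; \prod_{i=1}^{k} \left(\max_i - \min_i + 1\right),
\qquad
C_{\text{scoped}} \;=\; 3^{k}.
$$

For two hypothetical activities whose ranges differ by 2 s and by 300 s, $C = 3 \times 301 = 903$, whereas $C_{\text{scoped}} = 3^2 = 9$.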

Behavior Space 

The scheduler within the stateless Brahms model checker generates all possible paths through the different points of non-determinism in the Brahms model. In describing the output of the Brahms stateless model checker, we use the terms path and trace interchangeably. Intuitively, a path (or trace) generated by the Brahms stateless model checker is equivalent to a single simulation run. More formally, a path or trace is a sequence of events executed by the simulator, $(e_0, e_1, e_2, \ldots, e_k)$. Each event in the trace is a tuple $(a, t, (u, val))$, where $a$ is the actor id, $t$ is the Brahms clock time, and $u$ is the fact or belief updated to the value $val$. For each trace we generate a sequence of nodes in the intermediate representation, $n_{init}, n_0, n_1, n_2, \ldots, n_k$. The initial node in the sequence, $n_{init}$, is labeled with the initial belief and fact values of the various agents and objects. The event $e_0 := (a_0, t_0, (u_0, val_0))$ is applied to the initial node $n_{init}$, where the value of $u_0$ is updated to $val_0$, yielding $n_0$. Each subsequent event is applied in sequence to the preceding node in the intermediate representation, generating $n_{init}, n_0, n_1, n_2, \ldots, n_k$.
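The construction of the node sequence from a trace can be sketched directly. Each event $(a, t, (u, val))$ is applied to the preceding node by overwriting the value of $u$. The Python names (`Event`, `trace_to_nodes`) and the dictionary representation of nodes are illustrative assumptions, not the data model of the Brahms checker.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    actor: str      # a: id of the agent or object that performed the update
    time: int       # t: Brahms clock time (seconds)
    var: str        # u: the fact or belief being updated
    val: object     # val: its new value


def trace_to_nodes(init: Dict[str, object], trace: List[Event]) -> List[Dict[str, object]]:
    """Return [n_init, n_0, ..., n_k], where n_i results from applying e_i to the previous node."""
    nodes = [dict(init)]
    for event in trace:
        node = dict(nodes[-1])      # copy the previous belief/fact valuation
        node[event.var] = event.val
        nodes.append(node)
    return nodes
```

In a usage example, the belief names and clock times would be made up. What the sketch is meant to show is that every node is a full valuation, so each node of a stored trace can be checked independently.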

Summary of the Results 

There are several activities in the Brahms Uberlingen model 
with a specified range of minimum and maximum durations. 
Due to the size and complexity of the model, generating a 
single trace takes approximately 15 minutes. It would in all 
likelihood take a few weeks to generate all possible traces 
within the system. In order to mitigate this computational 
bottleneck, we scope the verification of the model: we non-deterministically explore the minimum, median, and maximum durations of each activity in the model. Approximately a third of the traces generated by the stateless Brahms model checker lead to a collision. If collision were treated as an undesired property (a fault) of the model, the model-checking results would indicate a very high error density. It is important to note, however, that the goal of the collision configuration of the Brahms Uberlingen model was to faithfully recreate the conditions that led to the planes colliding. The verification results show that, even with timing variations, a large number of paths (one third of the generated paths) lead to the collision, because the ATCO was distracted by the AEF flight, the short-term conflict alert (STCA) system, which provides optical and audible alerts to the ATCO, was under maintenance, and only one ATCO was on duty.
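The scoping step can be illustrated as follows. Each activity's duration range is reduced to its minimum, median, and maximum, every combination is run, and the fraction of runs flagged as collisions is reported. The sketch assumes that each activity executes once with a single duration and takes the median as the integer midpoint of the range. The activity names and the caller-supplied `run` function (standing in for one stateless model-checker trace) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Ranges = Dict[str, Tuple[int, int]]   # hypothetical: activity -> (min, max) duration in s


def scoped_durations(lo: int, hi: int) -> List[int]:
    """Minimum, (integer) median, and maximum of a duration range, without duplicates."""
    return sorted({lo, (lo + hi) // 2, hi})


def scoped_collision_rate(ranges: Ranges, run: Callable[[Dict[str, int]], bool]) -> Tuple[int, float]:
    """Run every scoped duration combination; return (#runs, fraction that collide)."""
    names = sorted(ranges)
    choices = [scoped_durations(*ranges[name]) for name in names]
    outcomes = [run(dict(zip(names, combo))) for combo in product(*choices)]
    return len(outcomes), sum(outcomes) / len(outcomes)
```

At roughly 15 minutes per trace, the wall-clock cost grows linearly with the number of combinations, so limiting each range to three values is what keeps the exploration feasible.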

We present an overview of the verification results for the two properties of interest described earlier: (a) how the arrival of the AEF flight affects the ATCO's ability to monitor the large airspace, and (b) whether a collision always occurs in this period. Some of the results described for the simulation also hold for the traces generated during verification. Verification also let us study aspects of the model, relevant to these properties, that were not observed in simulation. In the simulation runs of the Uberlingen model, the final AEF hand-off (directing the pilots to contact the tower) always occurs after the TCAS RA; in the verification runs, however, the final AEF hand-off can occur before the TA is ever issued. Some of the cases observed are as follows:

1. The final AEF hand-off occurs before the TA, and the separation infringement is detected and resolved.

2. The final AEF hand-off occurs before the TA, and the planes still collide. This is a particularly interesting scenario because the Uberlingen accident report states that the final AEF hand-off occurred before the TA was issued to either of the planes.
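Because the generated paths are stored, cases such as the two above can be found with a simple query over the traces. The sketch below assumes that each stored trace has been reduced to a hypothetical record holding the clock times of the final AEF hand-off and of the first TCAS TA (None if the event never occurs), together with a collision flag. The record fields and function names are illustrative, not the schema of the Brahms database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TraceSummary:
    aef_handoff_time: Optional[int]   # clock time of the final AEF hand-off, if any
    first_ta_time: Optional[int]      # clock time of the first TCAS TA, if any
    collided: bool


def handoff_before_ta(t: TraceSummary) -> bool:
    """True if the final AEF hand-off happens before any TA is issued."""
    if t.aef_handoff_time is None:
        return False
    return t.first_ta_time is None or t.aef_handoff_time < t.first_ta_time


def classify(traces: Iterable[TraceSummary]) -> Counter:
    """Count traces by (hand-off before TA?, collision?)."""
    return Counter((handoff_before_ta(t), t.collided) for t in traces)
```

In this encoding, case 1 corresponds to the key (True, False) and case 2 to (True, True).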




The verification results indicate that while in some cases the 
AEF flight arrival can exacerbate the problem for the ATCO, 
it is not the only cause of the accident. From a wider sys- 
temic perspective, the separation violation did not occur at 
Uberlingen only because of the arrival time of the AEF flight. Rather, the Skyguide company had tolerated a deviant form of SMOP during night operations; consequently, nobody was carrying out the role of supervisor in the ATC. Nobody was responsible for the system, particularly during the maintenance process. Otherwise, the ATCO would have been informed that the STCA optical alert was not functioning and that the backup phones had been disabled.

We can encode the output of the Brahms stateless model checker into a PRISM model [17] and check various probabilistic properties of the system. In such an encoding, the updates to facts and beliefs represent the probabilistic updates of the system. Note that the output of the Brahms stateless model checker can be encoded as the intermediate representation described in [16].
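One way such an encoding could look is sketched below. The stored paths are assumed to have been merged into a finite graph whose nodes become values of a single PRISM state variable, probabilistic updates become probabilistic choices, and collision nodes are labeled so that a reachability probability can be queried. The sketch further assumes that the outgoing probabilities of every node sum to 1 and that node ids are contiguous from 0. The function `to_prism_dtmc` and its input format are hypothetical, and the encoding actually used with [16] may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: node id -> list of (probability, successor id); node 0 is initial.
Graph = Dict[int, List[Tuple[float, int]]]


def to_prism_dtmc(graph: Graph, collision_nodes: Set[int]) -> str:
    """Emit a PRISM DTMC whose single variable s ranges over the graph's node ids."""
    max_id = max(set(graph) | {t for succs in graph.values() for _, t in succs})
    lines = ["dtmc", "", "module behaviours", f"  s : [0..{max_id}] init 0;"]
    for node in sorted(graph):
        succs = graph[node]
        if not succs:
            continue                                # terminal nodes handled below
        update = " + ".join(f"{p}:(s'={t})" for p, t in succs)
        lines.append(f"  [] s={node} -> {update};")
    terminals = sorted(n for n in range(max_id + 1) if not graph.get(n))
    for node in terminals:
        lines.append(f"  [] s={node} -> true;")     # self-loop avoids deadlocks
    lines += ["endmodule", ""]
    guard = " | ".join(f"s={n}" for n in sorted(collision_nodes)) or "false"
    lines.append(f'label "collision" = {guard};')
    return "\n".join(lines)
```

The probability of reaching a collision could then be queried in PRISM with `P=? [ F "collision" ]`. If duration choices are treated as non-deterministic rather than probabilistic, an `mdp` model with one command per alternative would be the more faithful target.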

The Uberlingen collision scenario offers little opportunity for checking sophisticated properties, since a large number of paths lead to the collision of the planes. The model, however, lends itself to extension to more general cases and to other scenarios in the aviation domain. For example, in practice most pilots commonly ignore the TA alerts issued by TCAS but are trained to react immediately to an RA. Similarly, rather than specifying them in the initial configuration, we can extend the model with probabilistic updates that determine whether the phones are down, whether the STCA is under maintenance, and whether the other ATCO is on a break.

DISCUSSION 

Our overall goal is to model and analyze interactions between 
humans and automated systems, and apply this methodology 
to the safety analysis of NextGen. We conjecture that such an analysis needs to be done early in design, before deploying any new automation, since safety-related problems detected early in the design phase are easier to fix. To achieve this goal, we need to reason about how humans perform their tasks in conjunction with automation that is complex and therefore hard to grasp in its entirety. This led us to make the following choices.

To model the interactions between humans and automation 
we chose the Brahms modeling language. Brahms has the 
ability to reason about agents and objects that can represent 
humans as well as automated systems. The agents can have varying levels of intelligence, which gives us the flexibility to model agents at varying granularity. Simulated agent behavior can range from rational procedure following, to how people actually do their work (i.e., their practice), to reactive behavior that is fragmented, unfocused, incomplete, etc. We can encode non-deterministic choices in the model and even assign probabilities to these choices. We can also express the notion of belief, which is quite important when a human interacts with a complex system. For example, the pilot in charge during the Air France 447 accident described in [16] held wrong beliefs about the pitch of the plane, so being able to model his beliefs about the state of the system is important. Brahms also gives us the added benefit of being able to model a working environment precisely (e.g., a controller console is two yards away from another one, which implies that some time is needed to switch from one to the other). Early in the design phase, details, including those about work settings, are not necessarily known.
However, it is advantageous to have this feature when one 
wants to refine the analysis as one gets closer to deployment 
at particular locations. Quite often FAA ground systems re- 
quire adaptations when they are fielded at new locations. Us- 
ing Brahms’ capabilities, one can tune a generic model to the 
details of a particular location and verify that no new safety 
issues can appear. 

From an analysis point of view, we are taking a pragmatic ap- 
proach, adopting both simulations and model checking tech- 
niques. In this work, we experimented with generating the 
behaviors using a stateless model checker. The reachability graph can also be generated by the JPF model checker as described in [16], where a subset of the Brahms semantics is currently implemented as a Java library. Once the implementation of the Brahms semantics in JPF is complete, we can leverage several JPF extensions to improve the scalability of the analysis. Another important criterion is the ability to reason about beliefs and probabilistic behaviors. As described in the verification framework of [16], we can encode the reachability graph as input for different model checkers, such as PRISM, SPIN, and NuSMV. This allows us to leverage state-of-the-art verification technologies and check properties related to probabilities, liveness, and beliefs.

With respect to the methodology, it is important to reconcile the need for detail in an analysis with the fact that the analysis is performed early in design, when details are not necessarily known. Our first answer is to use fairly generic models of controllers and pilots based on the current literature (which includes the body of work in human factors studies). These generic models can then be refined as we progress
towards implementation, adaptation, and finally deployment. 
Scalability might still be an issue, and we will address it by 
using proper abstractions and, if possible, compositional ver- 
ification techniques. Similarly, we will have to address the 
scalability of the models when it comes to analyzing larger 
parts of the National Airspace System (e.g., multiple airports, 
multiple sectors, many airplanes). Fortunately, JPF is being extended with support for abstraction and compositional verification, which will give us a good base from which to draw.

RELATED WORK 

In addition to the approaches mentioned in the Introduction, 
there is a large body of work dealing with the verification 
of human-machine interactions and with the verification of 
avionic systems. DO-178B, Software Considerations in Airborne Systems and Equipment Certification, is the official guideline for certifying avionics software. Several model checking and formal verification techniques have been employed to verify avionics software in accordance with DO-178B [19, 2]. Recent work describes how changes in aircraft systems and in the air traffic system pose new challenges for certification, due to increased interaction and integration [23].

In [19] the authors present a framework that supports mul- 
tiple input formalisms to model avionic software: these in- 
clude MATLAB Simulink/Stateflow and SCADE. These for- 
malisms are then translated into an intermediate representa- 
tion using Lustre, a standard modelling language employed to 
model reactive systems with applications in avionics. Finally, 
Lustre models are translated to the input language of various 
model checkers, including NuSMV, PVS, and SAL. The key difference from the approach we describe for formal verification is that their translation is purely syntactic. In our work,
instead, we do not translate the modelling language, but we 
operate at the level of the Brahms simulator. This allows us to 
consider the full semantics of Brahms, and not a subset of the 
language compatible with the verification tools. More impor- 
tantly, we explicitly consider a hybrid system composed of 
software and humans, and we are able to reason about beliefs 
and probabilities, while the work in [19] is limited to tempo- 
ral properties. 

There is a vast literature on modelling human-machine interactions. Recently, Combefis et al. [12] have employed Java PathFinder as a model checker to verify human-machine interactions. The modelling language is based on Statecharts
but, as in the work of [19], this formalism does not allow 
us to reason about probabilities or beliefs. We refer to the 
references available in [12] for an overview of other similar 
approaches. 

The work of Yasmeen and Gunter [25] deals with the verifi- 
cation of the behaviour of human operators to check the ro- 
bustness of mixed systems. In this approach the authors em- 
ploy concurrent game structures as the modelling language 
and translate the verification problem to a model checking in- 
stance using SPIN. As in the previous cases, our approach is 
different in that we do not perform syntactic translations and 
we reason explicitly about probabilities and beliefs. We also provide a detailed and complex case study.

The Enhanced Operator Function Model (EOFM) is another 
modelling language developed to model and verify interac- 
tions between humans and automated systems [7]. Similarly 
to the other works described above, EOFM is translated into 
the input language of the model checker SAL to perform ver- 
ification of properties encoded in linear temporal logic. The 
authors describe the application of their framework to the ver- 
ification of a cruise control system for cars. The main lim- 
itation of this approach is that it currently supports single- 
operator systems only and, as in the case of [19] and [25], 
there is no support to reason about probabilities and beliefs. 